577 research outputs found

    Speaker diarization of multi-party conversations using participants role information: political debates and professional meetings

    Speaker Diarization aims at inferring who spoke when in an audio stream and involves two simultaneous unsupervised tasks: (1) the estimation of the number of speakers, and (2) the association of speech segments to each speaker. Most of the recent efforts in the domain have addressed the problem using machine learning techniques or statistical methods (for a review see [11]) ignoring the fact that the data consists of instances of human conversations

    Predicting continuous conflict perception with Bayesian Gaussian processes

    Conflict is one of the most important phenomena of social life, but it is still largely neglected by the computing community. This work proposes an approach that detects common conversational social signals (loudness, overlapping speech, etc.) and predicts the conflict level perceived by human observers in continuous, non-categorical terms. The proposed regression approach is fully Bayesian and adopts Automatic Relevance Determination to identify the social signals that most influence the outcome of the prediction. The experiments are performed on the SSPNet Conflict Corpus, a publicly available collection of 1430 clips extracted from televised political debates (roughly 12 hours of material for 138 subjects in total). The results show that it is possible to achieve a correlation close to 0.8 between actual and predicted conflict perception

    Annotation and detection of conflict escalation in political debates

    Conflict escalation in multi-party conversations refers to an increase in the intensity of conflict during a conversation. Here we study the annotation and detection of conflict escalation in broadcast political debates, with a view to a machine-mediated conflict management system. To this end, we label conflict escalation using crowd-sourced annotations and predict it with automatically extracted conversational and prosodic features. To annotate conflict escalation we deploy two strategies: indirect inference and direct assessment. In direct assessment, annotators watch and compare two consecutive clips during annotation; in indirect inference, each clip is annotated independently for its level of conflict, and the level of conflict escalation is then inferred by comparing the annotations of two consecutive clips. Empirical results on 792 pairs of consecutive clips, classified into three types of conflict escalation (escalation, de-escalation, and constant), show that labels from direct assessment yield higher classification performance (45.3% unweighted accuracy (UA)) than labels from indirect inference (39.7% UA), although the annotations from the two methods are highly correlated (r = 0.74 in continuous values and 63% agreement in ternary classes)
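
    The unweighted accuracy (UA) quoted in this abstract is commonly understood as the mean of the per-class recalls, so that each of the three classes counts equally regardless of how many clip pairs it contains. The following is a minimal sketch of that computation under this assumption; the function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    correct = defaultdict(int)  # correctly predicted items per true class
    total = defaultdict(int)    # number of items per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels for the three escalation classes (not data from the paper).
y_true = ["escalation", "escalation",
          "constant", "constant", "constant", "constant",
          "de-escalation", "de-escalation"]
y_pred = ["escalation", "constant",
          "constant", "constant", "constant", "escalation",
          "constant", "de-escalation"]

ua = unweighted_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UA = {ua:.3f}, plain accuracy = {plain_accuracy:.3f}")
```

    With three classes, a classifier that always predicts one class reaches a UA of one third, which is a useful baseline when reading the 45.3% and 39.7% figures.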

    Oxidative stress and epigenetic regulation in ageing and age-related diseases

    Recent statistics indicate that the human population is ageing rapidly. The number of elderly people, both healthy and diseased, is increasing. This trend is particularly evident in Western countries, where healthier living conditions and better cures are available. Understanding the process leading to age-associated alterations is, therefore, of the highest relevance for the development of new treatments for age-associated diseases, such as cancer, diabetes, Alzheimer's disease and cardiovascular accidents. Mechanistically, it is well accepted that the accumulation of intracellular damage caused by reactive oxygen species (ROS) might orchestrate the progressive loss of control over biological homeostasis and the functional impairment typical of aged tissues. Here, we review how epigenetics takes part in the control of stress stimuli and the mechanisms of ageing physiology and physiopathology. Alteration of epigenetic enzyme activity, histone modifications and DNA methylation is, in fact, typically associated with the ageing process. Specifically, ageing presents peculiar epigenetic markers that, taken all together, form the still ill-defined “ageing epigenome”. Understanding the mechanisms and pathways leading to epigenetic modifications associated with ageing may help the development of anti-ageing therapies

    Infinite Models for Speaker Clustering

    In this paper we propose the use of infinite models for the clustering of speakers. Speaker segmentation is obtained through a Dirichlet Process Mixture (DPM) model, which can be interpreted as a flexible model with an infinite a priori number of components. Learning is based on a Variational Bayesian approximation of the infinite sequence. The DPM model is compared with fixed prior systems learned by ML/BIC, MAP/BIC and a Variational Bayesian method. Experiments are run on a speaker clustering task on the NIST-96 Broadcast News database
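
    One common way to make "an infinite a priori number of components" concrete is the stick-breaking construction of the Dirichlet process, truncated at a finite level so that it can be handled numerically. The sketch below draws mixture weights in that way. It is an illustration only: the abstract does not say which construction or truncation the authors use, and the function name, `alpha` and `truncation` are assumptions introduced here.

```python
import random


def truncated_stick_breaking(alpha, truncation, seed=0):
    """Draw mixture weights from a stick-breaking prior truncated at `truncation` components.

    alpha:      concentration parameter; larger values spread mass over more components.
    truncation: number of components kept (an assumed finite cut-off of the infinite sequence).
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick to break off
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last component takes whatever mass is left
    return weights


weights = truncated_stick_breaking(alpha=1.0, truncation=10)
print([round(w, 3) for w in weights])
```

    In a speaker clustering setting, each weight would be the prior probability of one candidate speaker cluster; components whose weight stays negligible after learning are effectively unused, which is how such a model can avoid fixing the number of speakers in advance.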

    An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

    Discriminant linear processing of time-frequency plane

    Extending previous work done on considerably smaller data sets, the paper studies linear discriminant analysis of about 30 hours of phoneme-labeled speech data in the time-frequency domain. Analysis is carried out both independently in time and frequency and jointly. Data-driven spectral bases show a frequency sensitivity similar to that of human hearing. LDA-derived temporal FIR filters are consistent with temporal lateral inhibition. Considerable improvement is obtained using the first temporal discriminant

    Application of Out-Of-Language Detection To Spoken-Term Detection

    This paper investigates the detection of English spoken terms in a conversational multi-language scenario. The speech is processed using a large vocabulary continuous speech recognition system. The recognition output is represented in the form of word recognition lattices, which are then searched for the required terms. Because the input may contain multi-lingual speech segments, the spoken term detection (STD) system is combined with a module performing out-of-language detection to adjust its confidence scores. First, experimental results of spoken term detection are provided on the conversational telephone speech database distributed by NIST in 2006. Then, the system is evaluated on a multi-lingual database with and without the out-of-language detection module, where we are only interested in detecting English terms (stored in the index database). Several strategies to combine these two systems in an efficient way are proposed and evaluated. Around 7% relative improvement over a stand-alone STD system is achieved